In [19]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [2]:
A = np.eye(5)
print(A)
In this part of the exercise, you will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities, and you have data for profits and populations from those cities.
You would like to use this data to help you select which city to expand to next. The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss.
Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). (Many other problems that you will encounter in real life are multi-dimensional and can't be plotted on a 2-d plot.)
In [5]:
datafile = os.path.join('ex1', 'ex1data1.txt')  # portable path to the dataset
df = pd.read_csv(datafile, header=None, names=['Population', 'Profit'])
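Before plotting, it can help to glance at the raw numbers. A minimal, optional check (not part of the original exercise) on the DataFrame loaded above:
In [ ]:
# optional: inspect the first few rows and summary statistics of the data
print(df.shape)
print(df.head())
print(df.describe())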
In [240]:
def plot_data(x, y):
    """Plot the training data as a scatter of population vs. profit."""
    plt.figure(figsize=(10, 6))
    plt.plot(x, y, '.', label='Training Data')
    plt.xlabel("Population of City in 10,000s")
    plt.ylabel("Profit in $10,000s")
In [241]:
plot_data(df['Population'], df['Profit'])
In this part, you will fit the linear regression parameters $\theta$ to our dataset using gradient descent.
The objective of linear regression is to minimize the cost function
$$ J\left( \theta \right) = \frac{1}{2m} \sum_{i=1}^m \left( h_\theta \left( x^{\left( i\right)} \right) - y^{\left( i \right)} \right)^2 $$
where $h_\theta\left( x \right)$ is the hypothesis given by the linear model
$$ h_\theta\left( x \right) = \theta^\intercal x = \theta_0 + \theta_1 x_1 $$
Recall that the parameters of your model are the $\theta_j$ values. These are the values you will adjust to minimize cost $J(\theta)$. One way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update
$$ \theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^m \left( h_\theta\left( x^{\left( i\right)} \right) - y^{\left(i\right)}\right) x_j^{\left(i\right)} \;\;\;\;\;\;\;\;\;\; \text{simultaneously update } \theta_j \text{ for all } j \text{.} $$
With each step of gradient descent, your parameters $\theta_j$ come closer to the optimal values that will achieve the lowest cost $J(\theta)$.
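Stacking the training examples into the design matrix $X$ (with a leading column of ones) and the targets into the vector $y$, the same update can be written in vectorized form, which is exactly what the implementation below computes:
$$ \theta := \theta - \frac{\alpha}{m} X^\intercal \left( X\theta - y \right) $$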
In the following cells, we add a column of ones to our data to accommodate the $\theta_0$ intercept term.
In [59]:
# set the number of training examples
m = len(df['Population'])
In [102]:
# create an array from the dataframe (missing column for x_0 values)
X = df['Population'].values
# add in the first column of the array for x_0 values
X = X[:, np.newaxis]
X = np.insert(X, 0, 1, axis=1)
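Equivalently (just an alternative sketch, not one of the original cells), the design matrix can be built in one step by stacking a column of ones next to the population values:
In [ ]:
# alternative, equivalent construction of the design matrix
X_alt = np.column_stack((np.ones(m), df['Population'].values))
assert np.allclose(X_alt, X)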
In [143]:
y = df['Profit'].values
y = y[:, np.newaxis]
Let's make the arbitrary initial guess that $\theta_0$ = 0 and $\theta_1$ = 0. In that case, we have the following output from the hypothesis function.
In [117]:
theta_values = np.array([[0.], [0]])
print(theta_values.shape)
print(X.shape, end='\n\n')
_ = np.dot(X, theta_values)
print(_.shape)
In [118]:
# define the hypothesis
def h(theta, X):
    """Takes the dot product of the matrix X and the vector theta,
    yielding a predicted result.
    """
    return np.dot(X, theta)
In [152]:
def compute_cost(X, y, theta):
    """Takes the design matrix X and output vector y, and computes the cost of
    the parameters stored in the vector theta.

    The dimensions must be as follows:
    - X must be m x n
    - y must be m x 1
    - theta must be n x 1
    """
    m = len(y)
    J = 1 / (2*m) * np.dot((np.dot(X, theta) - y).T, (np.dot(X, theta) - y))
    return J
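For comparison, here is an equivalent element-wise formulation of the same cost (a sketch added for readability; it returns the same value as compute_cost):
In [ ]:
# equivalent element-wise formulation of the cost function (sketch)
def compute_cost_elementwise(X, y, theta):
    m = len(y)
    errors = np.dot(X, theta) - y      # m x 1 vector of residuals
    return float(np.sum(errors ** 2) / (2 * m))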
In [156]:
# define column vector theta = [[0], [0]]
theta = np.zeros((2, 1))
In [159]:
# compute the cost function for our existing X and y, with our new theta vector
# verify that the cost for our theta of zeros is 32.07
compute_cost(X, y, theta)
Now we'll actually implement the gradient descent algorithm. Keep in mind that the cost $J(\theta)$ is parameterized by the vector $\theta$, not by $X$ and $y$. That is, we minimize $J(\theta)$ by changing $\theta$. We initialize the parameters to 0 and the learning rate alpha to 0.01.
In [271]:
def gradient_descent(X, y, theta, alpha, num_iters):
    """Minimize the cost function by batch gradient descent.

    Runs num_iters updates of theta with learning rate alpha, and returns the
    final theta along with the history of costs and parameter vectors.
    """
    m = len(y)
    J_history = []
    theta_history = []
    for i in range(num_iters):
        J_history.append(float(compute_cost(X, y, theta)))
        theta_history.append(theta)
        theta = theta - (alpha / m) * np.dot(X.T, (np.dot(X, theta) - y))
    return theta, J_history, theta_history
In [272]:
# set up some initial parameters for gradient descent
theta_initial = np.zeros((2, 1))
iterations = 1500
alpha = 0.01
In [273]:
theta_final, J_hist, theta_hist = gradient_descent(X, y, theta_initial, alpha, iterations)
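As an optional sanity check (not part of the original exercise), the learned parameters can be compared against the closed-form normal-equation solution $\theta = \left( X^\intercal X \right)^{-1} X^\intercal y$, which batch gradient descent should approach:
In [ ]:
# optional: compare gradient descent's result with the normal-equation solution
theta_closed_form = np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y))
print(theta_final)
print(theta_closed_form)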
After running the batch gradient descent algorithm, we can plot the convergence of $J(\theta)$ over the number of iterations.
In [237]:
def plot_cost_convergence(J_history):
    """Plot the cost history recorded during gradient descent."""
    abscissa = list(range(len(J_history)))
    ordinate = J_history
    plt.figure(figsize=(10, 6))
    plt.plot(abscissa, ordinate, '.')
    plt.title('Convergence of the Cost Function', fontsize=18)
    plt.xlabel('Iteration Number', fontsize=14)
    plt.ylabel('Cost Function', fontsize=14)
    plt.xlim(min(abscissa) - max(abscissa) * 0.05, 1.05 * max(abscissa))
In [239]:
plot_cost_convergence(J_hist)
plt.ylim(4.3, 6.9)
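The learning rate controls how quickly (or whether) the cost converges. As an optional illustration (the alpha values below are chosen arbitrarily, not prescribed by the exercise), we can rerun gradient descent with a few different learning rates and compare the convergence curves:
In [ ]:
# optional: compare convergence for a few (illustrative) learning rates
plt.figure(figsize=(10, 6))
for lr in (0.001, 0.005, 0.01):
    _, J_lr, _ = gradient_descent(X, y, np.zeros((2, 1)), lr, 200)
    plt.plot(range(len(J_lr)), J_lr, label='alpha = {}'.format(lr))
plt.xlabel('Iteration Number', fontsize=14)
plt.ylabel('Cost Function', fontsize=14)
plt.legend()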
In [257]:
plot_data(df['Population'], df['Profit'])
x_min = min(df.Population)
x_max = max(df.Population)
abscissa = np.linspace(x_min, x_max, 50)
hypot = lambda x: theta_final[0] + theta_final[1] * x
ordinate = [hypot(x) for x in abscissa]
plt.plot(abscissa, ordinate, label='Hypothesis h(x) = {:.2f} + {:.2f}x'.format(
    float(theta_final[0]), float(theta_final[1])), color='indianred')
plt.legend(loc=4, frameon=True)
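With the fitted parameters, a prediction is just a dot product with a feature vector that starts with 1 for the intercept. For example (population values chosen purely for illustration, in units of 10,000s):
In [ ]:
# example predictions; inputs and outputs are in units of 10,000s
predict_35k = np.dot(np.array([1, 3.5]), theta_final)   # city of 35,000 people
predict_70k = np.dot(np.array([1, 7.0]), theta_final)   # city of 70,000 people
print(float(predict_35k) * 10000)   # predicted profit in dollars
print(float(predict_70k) * 10000)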
In [258]:
from mpl_toolkits.mplot3d import axes3d, Axes3D
from matplotlib import cm
In [269]:
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(projection='3d')
theta_0_vals = np.linspace(-10, 10, 100)
theta_1_vals = np.linspace(-1, 4, 100)
theta1, theta2, cost = [], [], []
for t0 in theta_0_vals:
    for t1 in theta_1_vals:
        theta1.append(t0)
        theta2.append(t1)
        theta_array = np.array([[t0], [t1]])
        cost.append(float(compute_cost(X, y, theta_array)))
scat = ax.scatter(theta1, theta2, cost,
                  c=np.abs(cost), cmap=plt.get_cmap('rainbow'))
plt.xlabel(r'$\theta_0$', fontsize=24)
plt.ylabel(r'$\theta_1$', fontsize=24)
plt.title(r'Cost Function by $\theta_0$ and $\theta_1$', fontsize=24)
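A contour view of the same surface can make the bowl shape easier to read. The following is a sketch (not one of the original cells) built on a meshgrid over the same $\theta_0$ and $\theta_1$ ranges; the logarithmically spaced contour levels are an arbitrary choice:
In [ ]:
# contour view of the cost surface (sketch)
T0, T1 = np.meshgrid(theta_0_vals, theta_1_vals)
J_grid = np.zeros_like(T0)
for i in range(T0.shape[0]):
    for j in range(T0.shape[1]):
        J_grid[i, j] = float(compute_cost(X, y, np.array([[T0[i, j]], [T1[i, j]]])))
plt.figure(figsize=(10, 6))
plt.contour(T0, T1, J_grid, levels=np.logspace(-2, 3, 20))
plt.xlabel(r'$\theta_0$', fontsize=14)
plt.ylabel(r'$\theta_1$', fontsize=14)
plt.title('Contours of the Cost Function', fontsize=14)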
In [290]:
theta_0_hist = [x[0] for x in theta_hist]
theta_1_hist = [x[1] for x in theta_hist]
theta_hist_end = len(theta_0_hist) - 1
In [297]:
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(projection='3d')
theta_0_vals = np.linspace(-10, 10, 100)
theta_1_vals = np.linspace(-1, 4, 100)
theta1, theta2, cost = [], [], []
for t0 in theta_0_vals:
    for t1 in theta_1_vals:
        theta1.append(t0)
        theta2.append(t1)
        theta_array = np.array([[t0], [t1]])
        cost.append(float(compute_cost(X, y, theta_array)))
scat = ax.scatter(theta1, theta2, cost,
                  c=np.abs(cost), cmap=plt.get_cmap('rainbow'))
plt.plot(theta_0_hist, theta_1_hist, J_hist, 'r',
         label='Cost Minimization Path')
plt.plot(theta_0_hist[0], theta_1_hist[0], J_hist[0], 'ro',
         label='Cost Minimization Start')
plt.plot(theta_0_hist[theta_hist_end],
         theta_1_hist[theta_hist_end],
         J_hist[theta_hist_end], 'co', label='Cost Minimization Finish')
plt.xlabel(r'$\theta_0$', fontsize=24)
plt.ylabel(r'$\theta_1$', fontsize=24)
plt.title(r'Cost Function Minimization', fontsize=24)
plt.legend()
In [ ]: